First, colocalize the Darwin data with the HOT data.

## Set a location for the plots
outputdir = file.path("~", "depthplots")
dir.create(outputdir, recursive = TRUE)

## Colocalize the data
initialize_cmap(cmap_key = "5e05c500-d68d-11e9-9d3b-4f83fcec4710")
## $base_url
## [1] "https://simonscmap.com"
## 
## $route
## [1] "/api/data/sp?"
## 
## $api_key
## [1] "Api-Key 5e05c500-d68d-11e9-9d3b-4f83fcec4710"
df = get_metadata('tblHOT_Bottle', 'HPLC_chl3_bottle_hot')
## dat = compile(sourceTable='tblHOT_Bottle',
##               sourceVar='HPLC_chla_bottle_hot',
##               targetTables="tblDarwin_Ecosystem",
##               targetVars="CHL",
##               dt1="2013-03-03",
##               dt2="2015-04-03",
##               ## lat1=21.847,
##               ## lat2=22.75,
##               ## lon1=-158.363,
##               ## lon2=-157.9,
##               lat1 = df$Lat_Min,
##               lat2 = df$Lat_Max,
##               lon1 = df$Lon_Min,
##               lon2 = df$Lon_Max,
##               ## depth1=0.3,
##               ## depth2=4813.8,
##               depth1 = df$Depth_Min,
##               depth2 = df$Depth_Max,
##               temporalTolerance=c(3),
##               latTolerance=c(1),
##               lonTolerance=c(1),
##               depthTolerance=c(5))
## saveRDS(dat, file= file.path(outputdir, "dat.RDS"))
## Load the cached result of the (commented-out) compile() call above.
dat = readRDS(file= file.path(outputdir, "dat.RDS"))

## See a summary of this data.
dat %>% summary() %>% print()
##       time                          lat             lon             depth        HPLC_chla_bottle_hot HPLC_chla_bottle_hot_std
##  Min.   :2013-03-06 00:00:00   Min.   :21.85   Min.   :-158.4   Min.   :   1.7   Min.   :  8.0        Min.   :0               
##  1st Qu.:2013-10-01 00:00:00   1st Qu.:22.75   1st Qu.:-158.0   1st Qu.:  74.4   1st Qu.: 93.5        1st Qu.:0               
##  Median :2014-02-14 00:00:00   Median :22.75   Median :-158.0   Median : 150.2   Median :146.0        Median :0               
##  Mean   :2014-02-28 15:20:45   Mean   :22.68   Mean   :-158.0   Mean   : 558.8   Mean   :162.3        Mean   :0               
##  3rd Qu.:2014-09-16 00:00:00   3rd Qu.:22.75   3rd Qu.:-158.0   3rd Qu.: 549.9   3rd Qu.:215.5        3rd Qu.:0               
##  Max.   :2015-03-30 00:00:00   Max.   :22.75   Max.   :-157.9   Max.   :4810.5   Max.   :535.0        Max.   :0               
##                                                                                  NA's   :5735         NA's   :5743            
##       CHL            CHL_std      
##  Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0193   1st Qu.:0.0029  
##  Median :0.0284   Median :0.0064  
##  Mean   :0.0497   Mean   :0.0122  
##  3rd Qu.:0.0711   3rd Qu.:0.0186  
##  Max.   :0.2122   Max.   :0.0772  
##  NA's   :2322     NA's   :2322
dat %>% dim() %>% print()
## [1] 5982    8
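
To make the tolerance arguments (`temporalTolerance`, `depthTolerance`, and so on) more concrete, here is a minimal base R sketch of one plausible reading of tolerance-based colocalization: for each source observation, average the target values that fall within the time and depth tolerances. This is not the implementation behind `compile()`; the helper `colocalize_toy`, the column names, and all values are made up for illustration.

```r
## Toy illustration only: `colocalize_toy` is a hypothetical helper, not a package function.
colocalize_toy <- function(source, target, time_tol_days, depth_tol) {
  matched_mean <- numeric(nrow(source))
  for (i in seq_len(nrow(source))) {
    ## Target rows within the time tolerance AND the depth tolerance of source row i
    dt_days <- abs(as.numeric(difftime(target$time, source$time[i], units = "days")))
    near <- dt_days <= time_tol_days & abs(target$depth - source$depth[i]) <= depth_tol
    vals <- target$CHL[near]
    ## No match within tolerance gives NA, otherwise the mean of the matches
    matched_mean[i] <- if (length(vals) > 0) mean(vals) else NA_real_
  }
  data.frame(source, CHL = matched_mean)
}

## Made-up source (bottle-like) and target (model-like) observations
source <- data.frame(time = as.Date(c("2013-03-06", "2013-03-06")),
                     depth = c(5, 100),
                     HPLC_chla = c(90, 210))
target <- data.frame(time = as.Date(c("2013-03-05", "2013-03-07", "2013-03-20")),
                     depth = c(5, 8, 100),
                     CHL = c(0.02, 0.04, 0.10))

res <- colocalize_toy(source, target, time_tol_days = 3, depth_tol = 5)
print(res)
```

In this toy case the first source row matches two target rows, while the second has no target within tolerance and gets `NA`. The same mechanism would be one explanation for the `NA`s in the `CHL` column of the summary above.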

Now, make plots of CHL from the two data sources, over time:

## Make plots for each
for(ii in 1:88){
  onetime = dat$time %>% unique() %>% .[ii]
  onedat = dat  %>% select(lat, lon, time, CHL, depth, HPLC_chla_bottle_hot)
  
  ## Isolate to a single time point, then sort by depth
  onedat = onedat %>% filter(time == onetime)
  onedat = onedat %>% arrange(depth)

  if(nrow(onedat) == 0) next
  ## Make three plots
  ## png(file = file.path(outputdir, paste0("depthplot-", ii, ".png")),
  ##                      width = 1200, height = 500)
  par(mfrow=c(1,3), cex=1.2, oma = c(1,1,4,1))
  if(any(!is.na(onedat$CHL))){
    plot(y = onedat$CHL, x = onedat$depth,
         type='p', cex = 1.5, log="x", pch = 16, xlab = "Depth", ylab = "Darwin CHL", col='grey50')
  } else {
    plot(NA, ylim=c(0,1))
  }
  ## Label the figure with the time point, whether or not Darwin CHL is available
  mtext(onetime, side = 3, outer=TRUE, cex=2)
  title(main="Darwin Chl-A   vs   depth")
  if(any(!is.na(onedat$HPLC_chla_bottle_hot))){
    plot(y = onedat$HPLC_chla_bottle_hot,
         x = onedat$depth,
         type='p', cex=1.5, pch=16, log='x', xlab = "Depth", ylab = "HOT CHLA", col='grey50')
    title(main="HOT Chl-A   vs   depth")
    plot(y=onedat$CHL, x=onedat$HPLC_chla_bottle_hot, pch=16, ylab = "Darwin CHL", xlab = "HOT CHLA", col='grey50', cex=1.5)
    title(main="Scatterplot between two sources")
  } else {
    plot(NA, ylim=c(0,1))
    title(main="HOT Chl-A   vs   depth")
    plot(NA, ylim=c(0,1))
    title(main="Scatterplot between two sources")
  }
  ## graphics.off()
}

Other data leads

The Darwin data is here: https://simonscmap.com/catalog/datasets/Darwin_Ecosystem

The HOT data is here: https://simonscmap.com/catalog/datasets/HOT_Bottle_ALOHA

There is also another dataset that contains Chl-a data, but from a different source: https://simonscmap.com/catalog/datasets/HOT_PP

Chris Follet strongly recommends CTD data:

“My first instinct is to try and use depth profiles from Station ALOHA, focusing on the fluorescence derived Chlorophyll from the CTD drops. There is also some very high resolution ‘Wire Walker’ Data for a handful of cruises. I think that both of these data types are in CMAP. My recollection is that Aditya built an R package to deal with CMAP so this is probably easy to get/use? If this is so, there is a very clean way to start using the DARWIN model CHL which is also in CMAP https://simonscmap.com/catalog/datasets/Darwin_Ecosystem at 3 day depth resolution. (I added Stephanie to the chain in case she has thoughts here).”

“I would recommend also using the CTD data (realtime sensor dropped down the wire: chl from fluorescence, but there is also Salinity, Temperature, density, and nitrate which can be compared directly to DARWIN) because it is of much higher depth resolution. I think any precision lost using fluorescence is more than gained back in the shape resolution. I am very sure this data is also in CMAP, but to make sure you actually receive an email reply from me the data website from HOT is https://hahana.soest.hawaii.edu/hot/hot-dogs/cextraction.html which should help in finding the correct table in CMAP.”

I think this is the right CTD dataset: https://simonscmap.com/catalog/datasets/HOT_CTD
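
Because the CTD profiles are much finer in depth than the model output, comparing the two requires putting them on a common set of depths. Here is a minimal sketch, on synthetic data, of linearly interpolating a fine profile onto coarser depth levels with base R's `approx()`. The depth levels, the Gaussian-shaped profile, and all variable names are assumptions for illustration. They are not the actual Darwin grid or real CTD values.

```r
## Synthetic fine-resolution profile (made-up values): a chlorophyll peak near 110 m
ctd_depth <- seq(2, 200, by = 2)
ctd_chl <- 0.05 + 0.25 * exp(-((ctd_depth - 110) / 25)^2)

## Hypothetical coarse depth levels standing in for a model grid
model_depth <- c(5, 15, 25, 35, 45, 55, 65, 75.5, 90, 110, 135, 165, 195)

## Linear interpolation of the fine profile at the coarse levels.
## rule = 1 returns NA outside the observed depth range instead of extrapolating.
ctd_on_model <- approx(x = ctd_depth, y = ctd_chl, xout = model_depth, rule = 1)$y

comparison <- data.frame(depth = model_depth, ctd_chl = ctd_on_model)
print(comparison)
```

Averaging the fine profile within each model layer would be an alternative to point interpolation. Which is more appropriate depends on how the model values are defined.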

Early impressions

A preliminary look at the data suggests that OMD is well suited to explaining the difference. Plots of chlorophyll against depth appear to show shifts of mass along the depth axis that are not explained well by:

  1. Scatterplots between the CHL values, or
  2. An L2 distance.
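
To illustrate why a shift along the depth axis is poorly summarized by an L2 distance, here is a small synthetic sketch. It compares idealized unit-mass profiles on a common depth grid using the L2 distance and, as a stand-in for a transport-based distance, the 1-D Wasserstein-1 distance computed from cumulative sums. Whether this matches the OMD meant in these notes is an assumption. The profiles and helper names are hypothetical.

```r
depth <- seq(0, 200, by = 1)

## Hypothetical profile: a narrow peak centered at `center`, normalized to unit mass
peak_profile <- function(center, width = 5) {
  p <- exp(-((depth - center) / width)^2)
  p / sum(p)
}

## Pointwise (L2) distance between two profiles on the same grid
l2_dist <- function(p, q) sqrt(sum((p - q)^2))

## 1-D Wasserstein-1 distance between two unit-mass profiles on a regular grid:
## sum of absolute differences of the cumulative distributions, times the grid spacing
w1_dist <- function(p, q, spacing = 1) sum(abs(cumsum(p) - cumsum(q))) * spacing

base <- peak_profile(60)
shift_small <- peak_profile(90)   # peak moved 30 m deeper
shift_large <- peak_profile(140)  # peak moved 80 m deeper

## Once the peaks no longer overlap, L2 is nearly the same for both shifts...
l2 <- c(small = l2_dist(base, shift_small), large = l2_dist(base, shift_large))
## ...while the transport distance grows with the size of the shift
w1 <- c(small = w1_dist(base, shift_small), large = w1_dist(base, shift_large))
print(l2)
print(w1)
```

In this idealized setting the L2 distance cannot tell a 30 m shift from an 80 m shift, whereas the transport distance scales with how far the mass has moved.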

The next steps are to (1) colocalize and download the CTD data, (2) make visual comparisons, and (3) apply OMD.